Authorship analysis based on data compression
Similar resources
Authorship analysis based on data compression
This paper proposes to perform authorship analysis using the Fast Compression Distance (FCD), a similarity measure based on compression with dictionaries directly extracted from the written texts. The FCD computes a similarity between two documents through an effective binary search on the intersection set between the two related dictionaries. In the reported experiments the proposed method i...
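The dictionary-intersection idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes an LZW-style parse as the dictionary extractor and assumes the distance takes the form (|D(x)| - |D(x) ∩ D(y)|) / |D(x)|; neither detail is stated in the truncated abstract above. The function names `lzw_dictionary`, `shared_count`, and `fcd` are hypothetical.

```python
from bisect import bisect_left


def lzw_dictionary(text: str) -> set[str]:
    """Collect the multi-character phrases an LZW-style parse adds to its dictionary.

    Assumption: the dictionary is seeded with the single characters of `text`.
    """
    known = set(text)          # single-character seed entries
    phrases: set[str] = set()  # new entries discovered during the parse
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in known:
            current = candidate      # keep extending the current match
        else:
            known.add(candidate)     # first time this phrase is seen
            phrases.add(candidate)
            current = ch             # restart the match at this character
    return phrases


def shared_count(dx: set[str], sorted_dy: list[str]) -> int:
    """Count entries of dx that also occur in sorted_dy, using binary search."""
    count = 0
    for phrase in dx:
        i = bisect_left(sorted_dy, phrase)
        if i < len(sorted_dy) and sorted_dy[i] == phrase:
            count += 1
    return count


def fcd(x: str, y: str) -> float:
    """Assumed form of the distance: share of x's dictionary not found in y's."""
    dx = lzw_dictionary(x)
    if not dx:
        return 0.0  # convention chosen for this sketch only
    sorted_dy = sorted(lzw_dictionary(y))
    return (len(dx) - shared_count(dx, sorted_dy)) / len(dx)
```

In this sketch a value of 0 means every phrase of the first text also occurs in the second, and a value of 1 means no phrase is shared. Each dictionary is extracted once per document, so a sorted dictionary can be cached and reused across many pairwise comparisons.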
Authorship Attribution based on Data Compression for Telugu Text
Authorship attribution (AA) can be defined as the task of inferring characteristics of a document's author from the textual characteristics of the document itself. In this paper we evaluated the compression model for AA on Telugu text. We considered six different compressors namely Zip, BZip, GZip, LZW, PPM and PPMd in combination with three different compression distance measures such as ...
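As a rough illustration of pairing a compressor with a compression distance, the Python sketch below uses the Normalized Compression Distance (NCD), one widely used measure. Whether NCD is among the three measures this abstract goes on to name is not visible in the truncated text, so treat that choice as an assumption. The helper names `ncd` and `attribute` are hypothetical, and the standard-library `zlib` and `bz2` compressors stand in for the compressors evaluated in the paper.

```python
import bz2
import zlib
from typing import Callable, Mapping

Compressor = Callable[[bytes], bytes]


def ncd(x: bytes, y: bytes, compress: Compressor = zlib.compress) -> float:
    """Normalized Compression Distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(compress(x))
    cy = len(compress(y))
    cxy = len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def attribute(
    questioned: bytes,
    known_by_author: Mapping[str, bytes],
    compress: Compressor = zlib.compress,
) -> str:
    """Return the author whose known text is closest to the questioned text."""
    return min(
        known_by_author,
        key=lambda author: ncd(questioned, known_by_author[author], compress),
    )


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    samples = {
        "A": b"the quick brown fox jumps over the lazy dog " * 20,
        "B": b"lorem ipsum dolor sit amet consectetur " * 20,
    }
    query = samples["A"][:200]
    for name, compress in (("zlib", zlib.compress), ("bz2", bz2.compress)):
        print(name, attribute(query, samples, compress))
```

Passing the compressor as a parameter makes it straightforward to compare several compressors against the same distance measure, which mirrors the compressor-by-measure grid the abstract describes.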
Authorship Verification based on Compression-Models
Compression models represent an interesting approach for different classification tasks and have been used widely across many research fields. We adapt compression models to the field of authorship verification (AV), a branch of digital text forensics. The task in AV is to verify if a questioned document and a reference document of a known author are written by the same person. We propose an in...
Authorship Attribution using Compression Distances
Authorship attribution has been a field of interest for researchers in the past, especially for forensic purposes. In this thesis, to obtain the degree of Bachelor of Science from the Leiden University, we investigate character n-grams and so-called compression distances to prototypes on several datasets, i.e., the datasets provided by PAN Labs (a benchmarking activity on uncovering plagiarism,...
Journal
Journal title: Pattern Recognition Letters
Year: 2014
ISSN: 0167-8655
DOI: 10.1016/j.patrec.2014.01.019